\section{Our method}

Our method uses a \emph{break table} just like the method by Haddon
and Waite \cite{Haddon:1967}, but instead of building, moving, and
sorting the table while objects are moved, we first move the objects
and then construct the table.  For that, additional space in the form
of a bitmap is required.  The bitmap has one bit per word of
memory in the heap, which amounts to less than $2$\% additional memory
on a $64$-bit machine.  The bits in the bitmap are set by the
\emph{mark} phase of the garbage collector.

\refFig{fig-example-a} shows a heap in which shaded areas indicate
live objects and white areas indicate dead objects.  The heap contains
16 word as shown by the addresses.  At the bottom of the figure is
shown the bitmap after the mark phase (phase 1) is complete.

\begin{figure}
\begin{center}
\inputfig{fig-example-a.pdf_t}
\end{center}
\caption{\label{fig-example-a}
Example of initial heap.}
\end{figure}

In phase 2, the heap is compacted by sliding the live objects to the
beginning of the heap.  In this phase, two pointers are used, a
\emph{source} pointer pointing to words containing live objects, and a
\emph{destination} pointer pointing to words containing dead objects,
as illustrated by \refFig{fig-example-b}.  Words are copied from the
source location to the destination location.  In each iteration, the
destination location is incremented by one unit, whereas the source
location is incremented until either it reaches the end of the heap,
or a word containing a live object as indicated by the bitmap
containing a $1$.

\begin{figure}
\begin{center}
\inputfig{fig-example-b.pdf_t}
\end{center}
\caption{\label{fig-example-b}
Pointers to source and destination locations.}
\end{figure}

\refFig{fig-example-c} shows the situation when phase 2 is complete. 

\begin{figure}
\begin{center}
\inputfig{fig-example-c.pdf_t}
\end{center}
\caption{\label{fig-example-c}
Heap after compaction.}
\end{figure}

In phase 3, a \emph{break table} is built at the position of the
destination pointer.  The break table consists of a sequence $a_0,
d_0, a_1, d_1, \ldots, a_n, d_n, a_{n+1}$ of alternating
\emph{addresses} and \emph{deltas}.  The value $a_0$ is always $0$.
The table has an odd number of elements, because it both starts and
ends with an address.  Each $a_i$ (except possibly $a_0$ and
$a_{n+1}$) is the index of the beginning of a zone of dead objects.
Each $d_i$ is the sum of the sizes of the dead zones preceding
$a_{i+1}$.

\refFig{fig-example-d} shows the break table of the example heap in
\refFig{fig-example-a}.

\begin{figure}
\begin{center}
\inputfig{fig-example-d.pdf_t}
\end{center}
\caption{\label{fig-example-d}
Break table.}
\end{figure}

Notice that if both the bottom zone and the top zone of the heap
contain live objects, then our break table may require three additional
words of storage compared to the total number of free words available.
The reason for this additional requirement is that our table contains
\emph{sentinels} (the first two words and the last word) that are not
strictly required, but by including them, we avoid special cases in
our procedure for searching the table.  We can easily make sure that
three additional words are available by triggering a collection when
granting a request for memory would leave fewer than three free words
at the end of the heap.  Haddon and Waite \cite{Haddon:1967} did not
have this luxury, because they assumed an existing mark-and-sweep
collector.  For that reason, their paper contains an extensive
argument that their break table can fit in the available space.

The break table is built by scanning the bitmap from start to end.  In
practice, since the heap is likely to contain fairly large contiguous
zones, the bitmap will contain long runs of $0$s and long runs of
$1$s.  It is therefore advantageous to scan the bitmap a word at a
time, making this phase quite efficient. 

In phase 4, the lower part of the heap is traversed word by word in
order to adjust the pointers according to the contents of the break
table.  For each pointer value $p$, the table is searched for values 
$a_i, d_i, a_{i+1}$ such that $a_i \le p < a_{i+1}$.  The
pointer value $p$ is adjusted by subtracting $d_i$ from it.  To find
the entry, the break table is search using \emph{binary search}.

While the overhead of the binary search may seem unacceptably high,
two properties contribute to keeping this overhead low:

\begin{enumerate}
\item While the break table could contain as many as $N/4$ entries,
  where $N$ is the number of words in the heap, in practice, it
  contains far fewer than that, again because the heap is likely to
  contain relatively few relatively large zones.
\item It is very likely that the entry in the break table required to
  adjust a particular pointer is the same as the entry required to
  adjust the pointer immediately preceding it.  By testing this case
  first, the vast majority of full binary searches can be avoided.
\end{enumerate}
